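Visual Relationship Detection (VRD) pushes a computer vision model to "see" beyond individual object instances and "understand" how different objects in a scene are related. The traditional approach to VRD first detects the objects in an image and then separately predicts the relationships between the detected object instances. Such a disjoint approach is prone to predicting redundant relationship tags (i.e., predicates) with similar semantic meaning for the same object pair, or tags whose meaning is close to the ground truth but which are semantically incorrect. To remedy this, we propose to jointly train a VRD model with visual object features and semantic relationship features. To this end, we propose VReBERT, a BERT-like transformer model for visual relationship detection with a multi-stage training strategy that jointly processes visual and semantic features. We show that our simple BERT-like model outperforms state-of-the-art VRD models in predicate prediction. Furthermore, we show that by using the pre-trained VReBERT model, our model advances the state of the art in zero-shot predicate prediction by a significant margin (+8.49 R@50 and +8.99 R@100).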
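Characterizing the remarkable generalization performance of over-parameterized neural networks remains an open problem. In this paper, we promote a shift of focus towards initialization, rather than neural architecture or (stochastic) gradient descent, to explain this implicit regularization. Through a Fourier lens, we derive a general result for the spectral bias of neural networks and show that the generalization of neural networks is closely tied to their initialization. Further, we empirically solidify the developed theoretical insights using practical, deep networks. Finally, we make a case against the controversial flat-minima conjecture and show that Fourier analysis provides a more reliable framework for understanding the generalization of neural networks.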
This paper studies 3D dense shape correspondence, a key shape analysis application in computer vision and graphics. We introduce a novel hybrid geometric deep learning model that learns geometrically meaningful and discretization-independent features, using a U-Net as the primary node feature extraction module, followed by a spectral-based graph convolutional network. To create a diverse set of filters, we use anisotropic wavelet basis filters, which are sensitive to both different directions and band-passes. This filter set overcomes the over-smoothing behavior of conventional graph neural networks. To further improve the model's performance, we add a function that perturbs the feature maps in the last layer ahead of the fully connected layers, forcing the network to learn more discriminative features overall. The resulting correspondence maps show state-of-the-art performance on the benchmark datasets in terms of average geodesic error, along with superior robustness to discretization in 3D meshes. Our approach provides new insights and practical solutions for dense shape correspondence research.